A comparison of analytical methodologies in the Partido de La Plata
3.1 Executive Summary
3.2 Motivation and Objectives
- Calculate more precise estimates of population exposure to flooding
- Cite the Fathom paper
3.3 Data, Methodology, and Limitations
In this analysis, we compare three different methods of estimating the number of families exposed to flooding in the Partido de La Plata. The first is an area-weighted approach, which assumes that the population is distributed uniformly within each settlement. This method is the simplest of the three.
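As a toy sketch of the area-weighting step (made-up rectangles and a made-up family count, not the RENABAP data), the exposed-family estimate is just the settlement's family count scaled by the fraction of its area that falls inside the hazard polygon:

```python
from shapely.geometry import box

# Hypothetical settlement and hazard-zone polygons in a planar CRS
settlement = box(0, 0, 100, 100)    # 10,000 m2 settlement footprint
hazard_zone = box(50, 0, 150, 100)  # hazard polygon covering half of it

familias = 200  # hypothetical family count for the settlement

# Area weighting: families times the share of area inside the hazard zone
fraction = settlement.intersection(hazard_zone).area / settlement.area
fam_expuestas = fraction * familias
print(fam_expuestas)  # 100.0
```

The real analysis below does the same thing per hazard level, looping over every settlement-hazard pair.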
The second uses dasymetric mapping based on the Global Human Settlement Layer (GHSL), a global population density raster. In effect, we take the census-tract-level population from the 2022 Argentine census and distribute it more precisely throughout each tract based on the GHSL data, which are provided at 100 m resolution. This is intended to give us a finer-resolution estimate of the population in each settlement.
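The reallocation at the heart of dasymetric mapping can be sketched with a toy raster (an invented 2×2 grid of weights, not the real GHSL tile): each cell receives a share of the tract total proportional to its GHSL weight, so the cell values sum back to the census total.

```python
import numpy as np

# Hypothetical GHSL-style weights (people per cell) for one census tract
ghsl_weights = np.array([[10.0, 30.0],
                         [ 0.0, 60.0]])

tract_familias = 50.0  # hypothetical census family count for the tract

# Dasymetric reallocation: each cell gets a proportional share of the total
shares = ghsl_weights / ghsl_weights.sum()
familias_per_cell = shares * tract_familias
print(familias_per_cell)

# Mass preservation: downscaled cells sum back to the tract total
assert np.isclose(familias_per_cell.sum(), tract_familias)
```

The mass-preservation check mirrors the comparison of original vs. downscaled totals reported in Section 3.4.2.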
Our third method distributes each census tract's population proportionally to the number of buildings in the tract.
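A minimal sketch of that proportional allocation, with hypothetical settlement labels and counts:

```python
# Hypothetical building counts per settlement within one census tract
buildings_per_settlement = {"A": 40, "B": 10}
tract_familias = 100  # hypothetical census family count for the tract

# Each settlement gets families in proportion to its share of buildings
total = sum(buildings_per_settlement.values())
estimates = {
    name: count / total * tract_familias
    for name, count in buildings_per_settlement.items()
}
print(estimates)  # {'A': 80.0, 'B': 20.0}
```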
Each of these methods has its shortcomings.
The building-count approach would ideally weight by something like total building volume, since large differences in building height and density can produce large differences in the number of people exposed to flooding. We would also want to filter out buildings that are not residential.
The Argentine census data are themselves a sample and have not been updated since 2022.
The RENABAP family counts are, per their documentation, based on linear extrapolation from past data and are therefore approximate.
Ultimately, these figures should be treated as estimates, not precise counts. We report ranges of estimated exposure in order to give a more robust picture.
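Reporting a range rather than a point estimate can be sketched as follows (made-up per-method numbers for a single barrio and hazard level):

```python
# Hypothetical estimates from the three methods for one barrio/hazard level
estimates = {"area": 28.5, "ghsl": 45.0, "edificios": 36.3}

# Report the spread across methods instead of a single figure
low, high = min(estimates.values()), max(estimates.values())
print(f"{low:.1f}-{high:.1f} familias expuestas")  # 28.5-45.0 familias expuestas
```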
3.4 Results
- Summarize data by hazard level, by barrio, by cuenca, and by eje
- Also summarize by "localidad", which is how the municipality reports these figures
- Include some basic charts
- The data are generally intuitive and consistent with previous findings (e.g., from the municipality)
- Area weighting gives the lowest estimates for the baja and media hazard levels (GHSL is lowest for alta), suggesting more families may be exposed to flooding than previously estimated
- Speculate as to why this is
- Remind readers that the question is also one of magnitude: do these estimates change where we think most people are, or the kinds of decisions we will make?
import pandas as pd
import geopandas as gpd
import requests
from io import StringIO
import boto3
import duckdb
import matplotlib.pyplot as plt
import numpy as np
import s2sphere
from botocore.config import Config
from rasterstats import zonal_stats
from shapely.geometry import box

USE_CRS = "EPSG:5349"
response = requests.get(
    "https://www.argentina.gob.ar/sites/default/files/renabap-2023-12-06.geojson"
)
renabap = gpd.read_file(StringIO(response.text))
renabap_pba = renabap[renabap["provincia"] == "Buenos Aires"]
renabap_pba = renabap_pba.to_crs(USE_CRS)

peligro_path = "/home/nissim/Documents/dev/ciut-inundaciones/data/la_plata_pelig_2023_datos_originales.geojson"
peligro = gpd.read_file(peligro_path)
peligro = peligro.to_crs(USE_CRS)

# Get the bounds of the peligro layer
peligro_bounds = peligro.total_bounds
peligro_bbox = box(*peligro_bounds)

# Filter renabap_pba to only include geometries that intersect with the peligro bounds
renabap_pba_intersect = renabap_pba[
    renabap_pba.geometry.intersects(peligro_bbox)
].copy()

# make sure all geometries are valid
renabap_pba_intersect = renabap_pba_intersect[renabap_pba_intersect.geometry.is_valid]
3.4.1 Areal interpolation
# Ensure both GeoDataFrames have the same CRS
if renabap_pba_intersect.crs != peligro.crs:
    peligro = peligro.to_crs(renabap_pba_intersect.crs)

# Get unique hazard levels
hazard_levels = peligro["PELIGROSID"].unique()

# Initialize result columns
renabap_with_porciones = renabap_pba_intersect.copy()
for level in hazard_levels:
    renabap_with_porciones[f"porcion_{level}"] = 0.0

# Calculate total area of each barrio
renabap_with_porciones["total_area"] = renabap_with_porciones.geometry.area

# For each barrio, calculate intersection with each hazard level
for idx, barrio in renabap_with_porciones.iterrows():
    barrio_geom = barrio.geometry
    barrio_total_area = barrio_geom.area
    if barrio_total_area == 0:
        continue
    for level in hazard_levels:
        hazard_subset = peligro[peligro["PELIGROSID"] == level]
        if hazard_subset.empty:
            continue
        # Calculate intersection area
        intersection_area = 0
        for _, hazard_row in hazard_subset.iterrows():
            try:
                intersection = barrio_geom.intersection(hazard_row.geometry)
                if not intersection.is_empty:
                    intersection_area += intersection.area
            except Exception as e:
                print(f"Error calculating intersection for {barrio.get('nombre_barrio', idx)}: {e}")
                continue
        # Calculate proportion
        proportion = intersection_area / barrio_total_area if barrio_total_area > 0 else 0
        renabap_with_porciones.at[idx, f"porcion_{level}"] = proportion

# Create tidy format dataframe
renabap_tidy = []
for idx, row in renabap_with_porciones.iterrows():
    for level in hazard_levels:
        familias_expuestas = row[f"porcion_{level}"] * row["familias_aproximadas"]
        renabap_tidy.append({
            "id_renabap": row["id_renabap"],
            "nombre_barrio": row["nombre_barrio"],
            "peligrosidad": level,
            "fam_expuestas_areal": familias_expuestas,
        })
renabap_tidy = pd.DataFrame(renabap_tidy)
print(renabap_tidy.head())
id_renabap nombre_barrio peligrosidad fam_expuestas_areal
0 2 Malvinas II alta 0.000000
1 2 Malvinas II baja 1.001885
2 2 Malvinas II media 0.000000
3 3 Ferroviario alta 0.000000
4 3 Ferroviario baja 18.936166
3.4.2 Dasymetric mapping with GHSL data
import rioxarray
from shapely.geometry import box

# Load GHSL data with dask chunking for memory efficiency
ghsl = rioxarray.open_rasterio(
    "/home/nissim/Downloads/spatial/GHS_POP_E2025_GLOBE_R2023A_54009_100_V1_0_R14_C13/GHS_POP_E2025_GLOBE_R2023A_54009_100_V1_0_R14_C13.tif",
    chunks={"x": 1024, "y": 1024},  # Adjust chunk size based on your memory constraints
)

# Reproject to the target CRS
ghsl = ghsl.rio.reproject(dst_crs=USE_CRS)

# Clip to the renabap_pba_intersect bounding box
bounding_box = box(*renabap_pba_intersect.total_bounds)
ghsl_clipped = ghsl.rio.clip(
    [bounding_box],  # Use the bounding box as a geometry (wrapped in a list)
    from_disk=True,  # Process from disk to avoid loading the entire dataset into memory
)

import rasterstats

# Step 1: Calculate the total GHSL population per barrio popular using zonal statistics
print("Calculating GHSL population totals per barrio popular...")

geometries = [geom for geom in renabap_pba_intersect.geometry]

stats = rasterstats.zonal_stats(
    geometries,
    ghsl_clipped.values[0],  # rasterstats expects a 2D array
    affine=ghsl_clipped.rio.transform(),
    stats=["sum"],
    nodata=ghsl_clipped.rio.nodata,
)

# Extract the sum values and attach them as a new column
ghsl_totals = [stat["sum"] if stat["sum"] is not None else 0 for stat in stats]
renabap_pba_intersect["ghsl_pop_est"] = ghsl_totals

from rasterio.features import rasterize
import numpy as np

# Get the reference raster properties from the GHSL data
reference_raster = ghsl_clipped
reference_transform = reference_raster.rio.transform()
reference_crs = reference_raster.rio.crs
reference_shape = reference_raster.shape[1:]  # 2D shape (height, width)

# Prepare geometries and values for rasterization
geometries_ghsl = [
    (geom, value)
    for geom, value in zip(
        renabap_pba_intersect.geometry, renabap_pba_intersect["ghsl_pop_est"]
    )
]
geometries_familias = [
    (geom, value)
    for geom, value in zip(
        renabap_pba_intersect.geometry, renabap_pba_intersect["familias_aproximadas"]
    )
]

# Create GHSL population raster
ghsl_pop_raster = rasterize(
    geometries_ghsl,
    out_shape=reference_shape,
    transform=reference_transform,
    fill=0,
    dtype=np.float32,
    all_touched=False,
)

# Create familias aproximadas raster
familias_raster = rasterize(
    geometries_familias,
    out_shape=reference_shape,
    transform=reference_transform,
    fill=0,
    dtype=np.float32,
    all_touched=False,
)

# Step 2: Divide the original GHSL by the barrio-level GHSL totals to get fractional population
# Use masking to avoid division on invalid cells
mask = (ghsl_clipped.values[0] != -200) & (ghsl_pop_raster > 0.1)
ghsl_fractional = np.full_like(ghsl_clipped.values[0], -200, dtype=np.float64)
ghsl_fractional[mask] = ghsl_clipped.values[0][mask] / ghsl_pop_raster[mask]

# Step 3: Multiply fractional population by familias aproximadas to get downscaled data
mask2 = (ghsl_fractional != -200) & (familias_raster > 0)
familias_downscaled = np.full_like(ghsl_clipped.values[0], -200, dtype=np.float64)
familias_downscaled[mask2] = ghsl_fractional[mask2] * familias_raster[mask2]

# Verify the results - exclude -200 from range calculations
ghsl_valid = ghsl_clipped.values[0] != -200
fractional_valid = ghsl_fractional != -200
downscaled_valid = familias_downscaled != -200

# Check that the sum of downscaled familias matches the original familias aproximadas
total_original_familias = renabap_pba_intersect["familias_aproximadas"].sum()
total_downscaled_familias = np.sum(familias_downscaled[downscaled_valid])
print(f"\nTotal original familias: {total_original_familias:,.0f}")
print(f"Total downscaled familias: {total_downscaled_familias:,.0f}")
print(f"Difference: {abs(total_original_familias - total_downscaled_familias):,.2f}")

# Intersect settlements with hazard zones
settlement_hazard = gpd.overlay(renabap_pba_intersect, peligro, how="intersection")

# Create GHSL tidy dataframe with matching structure
ghsl_tidy = []
for idx, row in settlement_hazard.iterrows():
    stats = zonal_stats(
        [row.geometry],
        familias_downscaled,       # downscaled numpy array
        affine=reference_transform,
        stats=["sum"],
        nodata=-200,               # nodata value used above
    )[0]
    ghsl_tidy.append(
        {
            "id_renabap": row["id_renabap"],
            "peligrosidad": row["PELIGROSID"],
            "fam_expuestas_ghsl": stats["sum"] if stats["sum"] is not None else 0,
        }
    )
ghsl_tidy = pd.DataFrame(ghsl_tidy)
print(ghsl_tidy.head())
Calculating GHSL population totals per barrio popular...
Total original familias: 88,856
Total downscaled familias: 88,680
Difference: 176.00
id_renabap peligrosidad fam_expuestas_ghsl
0 2 baja 0.000000
1 3 baja 14.286419
2 3 media 32.931858
3 4 alta 0.000000
4 4 media 134.000006
3.4.3 Estimates based on building counts
def fetch_buildings(geodataframe, temp_file="buildings_filtered.parquet"):
    """Fetch building data for a given GeoDataFrame region"""
    # Get S2 cell and bounds
    center = geodataframe.to_crs("epsg:3857").union_all().centroid
    center_wgs84 = (
        gpd.GeoDataFrame(geometry=[center], crs="EPSG:3857")
        .to_crs(epsg=4326)
        .geometry.iloc[0]
    )
    cell = s2sphere.CellId.from_lat_lng(
        s2sphere.LatLng.from_degrees(center_wgs84.y, center_wgs84.x)
    ).parent(10)
    bounds = geodataframe.to_crs("epsg:4326").total_bounds

    # Find matching S2 partition
    s3 = boto3.client(
        "s3",
        endpoint_url="https://data.source.coop",
        aws_access_key_id="",
        aws_secret_access_key="",
        config=Config(s3={"addressing_style": "path"}),
    )
    partitions = {
        obj["Key"].split("/")[-1].replace(".parquet", "")
        for obj in s3.list_objects_v2(
            Bucket="vida",
            Prefix="google-microsoft-open-buildings/geoparquet/by_country_s2/country_iso=ARG/",
        ).get("Contents", [])
    }
    parent_id = next(
        str(cell.parent(level).id())
        for level in range(10, 0, -1)
        if str(cell.parent(level).id()) in partitions
    )

    # Setup DuckDB and query
    con = duckdb.connect()
    for cmd in [
        "INSTALL spatial",
        "LOAD spatial",
        "INSTALL httpfs",
        "LOAD httpfs",
        "SET s3_region='us-east-1'",
        "SET s3_endpoint='data.source.coop'",
        "SET s3_use_ssl=true",
        "SET s3_url_style='path'",
    ]:
        con.execute(cmd)

    # Export and read back
    query = f"""
    COPY (SELECT * FROM 's3://vida/google-microsoft-open-buildings/geoparquet/by_country_s2/country_iso=ARG/{parent_id}.parquet'
    WHERE bbox.xmax >= {bounds[0]} AND bbox.xmin <= {bounds[2]}
    AND bbox.ymax >= {bounds[1]} AND bbox.ymin <= {bounds[3]}
    ) TO '{temp_file}' (FORMAT PARQUET);
    """
    con.execute(query)
    df = pd.read_parquet(temp_file)
    df["geometry"] = gpd.GeoSeries.from_wkb(df["geometry"])
    return gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")


# Usage:
buildings = fetch_buildings(renabap_pba_intersect)

# Reproject buildings to match the analysis CRS
buildings_proj = buildings.to_crs(USE_CRS)

# Step 1: Calculate buildings per settlement-hazard intersection
buildings_hazard = gpd.overlay(buildings_proj, settlement_hazard, how="intersection")

# Count buildings per settlement-hazard combination
buildings_per_hazard = (
    buildings_hazard.groupby(["id_renabap", "PELIGROSID"])
    .size()
    .reset_index(name="buildings_count")
)

# Step 2: Calculate total buildings per settlement (barrio popular)
buildings_settlement = gpd.overlay(
    buildings_proj, renabap_pba_intersect, how="intersection"
)
total_buildings_per_settlement = (
    buildings_settlement.groupby("id_renabap")
    .size()
    .reset_index(name="total_buildings")
)

# Step 3: Merge and calculate ratios
hazard_ratios = buildings_per_hazard.merge(
    total_buildings_per_settlement, on="id_renabap", how="left"
)
hazard_ratios["building_ratio"] = (
    hazard_ratios["buildings_count"] / hazard_ratios["total_buildings"]
)

# Step 4: Get total population per settlement and apply ratios
settlement_population = renabap_pba_intersect[
    ["id_renabap", "familias_aproximadas"]
].copy()

# Merge with ratios and calculate population estimates
population_estimates = hazard_ratios.merge(
    settlement_population, on="id_renabap", how="left"
)
population_estimates["estimated_population_hazard"] = (
    population_estimates["building_ratio"]
    * population_estimates["familias_aproximadas"]
)

# Step 5: Create final results with totals
final_results = population_estimates[
    ["id_renabap", "PELIGROSID", "estimated_population_hazard"]
].copy()

# Add total population rows (no hazard breakdown)
total_pop_rows = settlement_population.copy()
total_pop_rows["PELIGROSID"] = "total"
total_pop_rows["estimated_population_hazard"] = total_pop_rows["familias_aproximadas"]

# Combine
final_results = pd.concat(
    [
        final_results,
        total_pop_rows[["id_renabap", "PELIGROSID", "estimated_population_hazard"]],
    ],
    ignore_index=True,
)

# Create buildings tidy dataframe with matching structure
buildings_tidy = final_results[
    ["id_renabap", "PELIGROSID", "estimated_population_hazard"]
].copy()

# Rename columns to match the structure
buildings_tidy = buildings_tidy.rename(
    columns={
        "PELIGROSID": "peligrosidad",
        "estimated_population_hazard": "fam_expuestas_edificios",
    }
)

# Filter out the 'total' rows since we only want hazard-specific data
buildings_tidy = buildings_tidy[buildings_tidy["peligrosidad"] != "total"].copy()
print(buildings_tidy.head())
id_renabap peligrosidad fam_expuestas_edificios
0 2 baja 3.538827
1 3 baja 33.654237
2 3 media 22.766102
3 4 alta 36.258824
4 4 media 122.964706
3.4.4 Comparison of results
# Join all three dataframes by id_renabap and peligrosidad
final_df = renabap_tidy.merge(
    ghsl_tidy, on=["id_renabap", "peligrosidad"], how="outer"
)
final_df = final_df.merge(
    buildings_tidy, on=["id_renabap", "peligrosidad"], how="outer"
)

# Impute 0s for NA values in fam_expuestas columns
fam_expuestas_columns = [col for col in final_df.columns if "fam_expuestas" in col]
final_df[fam_expuestas_columns] = final_df[fam_expuestas_columns].fillna(0)

# Create long format dataframe with aggregation
final_tidy = []

# Add renabap data
for _, row in renabap_tidy.iterrows():
    final_tidy.append(
        {
            "id_renabap": row["id_renabap"],
            "peligrosidad": row["peligrosidad"],
            "metodo": "area",
            "fam_expuestas": row["fam_expuestas_areal"],
        }
    )

# Add ghsl data
for _, row in ghsl_tidy.iterrows():
    final_tidy.append(
        {
            "id_renabap": row["id_renabap"],
            "peligrosidad": row["peligrosidad"],
            "metodo": "ghsl",
            "fam_expuestas": row["fam_expuestas_ghsl"],
        }
    )

# Add buildings data
for _, row in buildings_tidy.iterrows():
    final_tidy.append(
        {
            "id_renabap": row["id_renabap"],
            "peligrosidad": row["peligrosidad"],
            "metodo": "edificios",
            "fam_expuestas": row["fam_expuestas_edificios"],
        }
    )

final_tidy = pd.DataFrame(final_tidy)

# Aggregate to get one observation per barrio per hazard level per method
final_tidy = (
    final_tidy.groupby(["id_renabap", "peligrosidad", "metodo"])["fam_expuestas"]
    .sum()
    .reset_index()
)

# Create complete combination of all barrios, hazard levels, and methods
all_barrios = final_tidy["id_renabap"].unique()
all_hazard_levels = ["alta", "baja", "media"]
all_methods = ["area", "ghsl", "edificios"]

complete_combinations = pd.DataFrame([
    {"id_renabap": barrio, "peligrosidad": hazard, "metodo": method}
    for barrio in all_barrios
    for hazard in all_hazard_levels
    for method in all_methods
])

# Merge with actual data and fill missing values with 0
final_tidy = complete_combinations.merge(
    final_tidy, on=["id_renabap", "peligrosidad", "metodo"], how="left"
)
final_tidy["fam_expuestas"] = final_tidy["fam_expuestas"].fillna(0)

print(final_tidy.head(10))
print(f"Shape: {final_tidy.shape}")
id_renabap peligrosidad metodo fam_expuestas
0 2 alta area 0.000000
1 2 alta ghsl 0.000000
2 2 alta edificios 0.000000
3 2 baja area 1.001885
4 2 baja ghsl 0.000000
5 2 baja edificios 3.538827
6 2 media area 0.000000
7 2 media ghsl 0.000000
8 2 media edificios 0.000000
9 3 alta area 0.000000
Shape: (2907, 4)
# Calculate total exposure per hazard level per method
summary = (
    final_tidy.groupby(["peligrosidad", "metodo"])["fam_expuestas"]
    .sum()
    .reset_index()
    .pivot(index="peligrosidad", columns="metodo", values="fam_expuestas")
)
print("Total Familias Expuestas por Peligrosidad y Método:")
print("=" * 50)
print(summary.round(2))
Total Familias Expuestas por Peligrosidad y Método:
==================================================
metodo area edificios ghsl
peligrosidad
alta 3552.36 3646.34 2831.97
baja 7581.05 9911.60 7726.58
media 8555.48 9678.24 8400.36
import matplotlib.pyplot as plt
import seaborn as sns

# Filter for high exposure (alta peligrosidad)
alta_data = final_tidy[final_tidy["peligrosidad"] == "alta"].copy()

# Calculate total exposure per barrio across all methods
total_exposure = (
    alta_data.groupby("id_renabap")["fam_expuestas"]
    .sum()
    .sort_values(ascending=False)
)
top_25_barrios = total_exposure.head(25).index

# Filter data for top 25 barrios
top_25_data = alta_data[alta_data["id_renabap"].isin(top_25_barrios)].copy()

# Create range plot showing min, max, and individual points
plt.figure(figsize=(15, 10))

# Define colors for methods
method_colors = {"area": "blue", "ghsl": "red", "edificios": "green"}

for i, barrio in enumerate(top_25_barrios):
    barrio_data = top_25_data[top_25_data["id_renabap"] == barrio]
    if len(barrio_data) > 0:
        values = barrio_data["fam_expuestas"].values
        min_val = values.min()
        max_val = values.max()
        # Plot range line
        plt.plot([min_val, max_val], [i, i], "k-", alpha=0.5, linewidth=2)
        # Plot individual points colored by method
        for _, row in barrio_data.iterrows():
            color = method_colors[row["metodo"]]
            plt.plot(row["fam_expuestas"], i, "o", color=color, markersize=6, alpha=0.8)

plt.yticks(range(len(top_25_barrios)), top_25_barrios)
plt.xlabel("Familias Expuestas")
plt.ylabel("Barrio ID")
plt.title("Range of High Exposure Estimates for Top 25 Barrios", fontsize=14)
plt.grid(True, alpha=0.3)

# Add legend
legend_elements = [
    plt.Line2D(
        [0], [0],
        marker="o",
        color="w",
        markerfacecolor=color,
        markersize=8,
        label=method,
    )
    for method, color in method_colors.items()
]
plt.legend(handles=legend_elements, title="Método")
plt.tight_layout()
plt.show()
# Calculate range of exposure estimates for each barrio and risk level
exposure_ranges = []

for barrio_id in final_tidy["id_renabap"].unique():
    for riesgo in ["alta", "baja", "media"]:
        # Get all method estimates for this barrio and risk level
        barrio_riesgo_data = final_tidy[
            (final_tidy["id_renabap"] == barrio_id)
            & (final_tidy["peligrosidad"] == riesgo)
        ]
        if len(barrio_riesgo_data) > 0:
            # Calculate range (max - min)
            exposure_values = barrio_riesgo_data["fam_expuestas"].values
            exposure_range = exposure_values.max() - exposure_values.min()
            # Get barrio name for reference
            barrio_name = (
                barrio_riesgo_data["nombre_barrio"].iloc[0]
                if "nombre_barrio" in barrio_riesgo_data.columns
                else f"ID_{barrio_id}"
            )
            exposure_ranges.append({
                "id_renabap": barrio_id,
                "nombre_barrio": barrio_name,
                "peligrosidad": riesgo,
                "exposure_range": exposure_range,
                "min_exposure": exposure_values.min(),
                "max_exposure": exposure_values.max(),
                "mean_exposure": exposure_values.mean(),
                "std_exposure": exposure_values.std(),
            })

exposure_ranges_df = pd.DataFrame(exposure_ranges)

# Report distributions of exposure ranges by risk level
print("=== DISTRIBUTION OF EXPOSURE RANGES BY RISK LEVEL ===")
print("=" * 60)

for riesgo in ["alta", "baja", "media"]:
    riesgo_data = exposure_ranges_df[exposure_ranges_df["peligrosidad"] == riesgo]
    print(f"\n{riesgo.upper()} PELIGROSIDAD:")
    print(f"  Number of barrios: {len(riesgo_data)}")
    print("  Range statistics:")
    print(f"    Mean range: {riesgo_data['exposure_range'].mean():.2f}")
    print(f"    Median range: {riesgo_data['exposure_range'].median():.2f}")
    print(f"    Min range: {riesgo_data['exposure_range'].min():.2f}")
    print(f"    Max range: {riesgo_data['exposure_range'].max():.2f}")
    print(f"    Std range: {riesgo_data['exposure_range'].std():.2f}")

    # Show barrios with highest ranges
    top_ranges = riesgo_data.nlargest(10, "exposure_range")
    print("  Top 10 barrios with highest exposure ranges:")
    for _, row in top_ranges.iterrows():
        print(
            f"    {row['nombre_barrio']}: {row['exposure_range']:.2f} "
            f"(min: {row['min_exposure']:.2f}, max: {row['max_exposure']:.2f})"
        )

# Overall summary
print("\n" + "=" * 60)
print("OVERALL SUMMARY")
print("=" * 60)
print(f"Total barrios analyzed: {exposure_ranges_df['id_renabap'].nunique()}")
print(f"Total observations: {len(exposure_ranges_df)}")

# Calculate overall statistics
overall_stats = (
    exposure_ranges_df.groupby("peligrosidad")["exposure_range"]
    .agg(["count", "mean", "median", "std", "min", "max"])
    .round(2)
)
print("\nOverall exposure range statistics by risk level:")
print(overall_stats)

# Show barrios with highest overall ranges across all risk levels
print("\nTop 15 barrios with highest exposure ranges across all risk levels:")
top_overall = exposure_ranges_df.nlargest(15, "exposure_range")
for _, row in top_overall.iterrows():
    print(f"  {row['nombre_barrio']} ({row['peligrosidad']}): {row['exposure_range']:.2f}")

# Create summary plots
plt.figure(figsize=(15, 10))

# Plot 1: Box plot of exposure ranges by risk level
plt.subplot(2, 2, 1)
exposure_ranges_df.boxplot(column="exposure_range", by="peligrosidad", ax=plt.gca())
plt.title("Distribution of Exposure Ranges by Risk Level")
plt.suptitle("")  # Remove default suptitle
plt.ylabel("Exposure Range")

# Plot 2: Histogram of exposure ranges
plt.subplot(2, 2, 2)
plt.hist(exposure_ranges_df["exposure_range"], bins=30, alpha=0.7, edgecolor="black")
plt.title("Distribution of All Exposure Ranges")
plt.xlabel("Exposure Range")
plt.ylabel("Frequency")

# Plot 3: Scatter plot of min vs max exposure
plt.subplot(2, 2, 3)
colors = {"alta": "red", "baja": "blue", "media": "orange"}
for riesgo in ["alta", "baja", "media"]:
    riesgo_data = exposure_ranges_df[exposure_ranges_df["peligrosidad"] == riesgo]
    plt.scatter(
        riesgo_data["min_exposure"],
        riesgo_data["max_exposure"],
        c=colors[riesgo],
        label=riesgo.upper(),
        alpha=0.6,
    )
plt.xlabel("Minimum Exposure")
plt.ylabel("Maximum Exposure")
plt.title("Min vs Max Exposure by Risk Level")
plt.legend()

# Plot 4: Bar chart of mean ranges by risk level
plt.subplot(2, 2, 4)
mean_ranges = exposure_ranges_df.groupby("peligrosidad")["exposure_range"].mean()
mean_ranges.plot(kind="bar", color=["red", "blue", "orange"])
plt.title("Mean Exposure Range by Risk Level")
plt.ylabel("Mean Exposure Range")
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

# Store the results
print("\n" + "=" * 60)
print("RESULTS STORED IN:")
print("  exposure_ranges_df: DataFrame with exposure ranges for each barrio and risk level")
print("=" * 60)
=== DISTRIBUTION OF EXPOSURE RANGES BY RISK LEVEL ===
============================================================
ALTA PELIGROSIDAD:
Number of barrios: 323
Range statistics:
Mean range: 4.26
Median range: 0.00
Min range: 0.00
Max range: 82.10
Std range: 10.41
Top 10 barrios with highest exposure ranges:
ID_1548: 82.10 (min: 28.46, max: 110.56)
ID_801: 62.42 (min: 232.05, max: 294.47)
ID_946: 45.96 (min: 0.00, max: 45.96)
ID_74: 45.39 (min: 0.00, max: 45.39)
ID_1083: 45.26 (min: 0.00, max: 45.26)
ID_24: 44.73 (min: 35.84, max: 80.57)
ID_4465: 38.88 (min: 33.51, max: 72.39)
ID_28: 37.96 (min: 42.48, max: 80.45)
ID_4: 36.26 (min: 0.00, max: 36.26)
ID_4387: 33.97 (min: 33.90, max: 67.87)
BAJA PELIGROSIDAD:
Number of barrios: 323
Range statistics:
Mean range: 11.57
Median range: 0.24
Min range: 0.00
Max range: 164.44
Std range: 22.08
Top 10 barrios with highest exposure ranges:
ID_74: 164.44 (min: 204.53, max: 368.97)
ID_812: 146.54 (min: 416.28, max: 562.82)
ID_59: 118.40 (min: 261.30, max: 379.70)
ID_1065: 115.90 (min: 132.23, max: 248.13)
ID_494: 110.65 (min: 376.49, max: 487.14)
ID_807: 110.06 (min: 228.55, max: 338.61)
ID_975: 95.64 (min: 314.36, max: 410.00)
ID_1075: 75.97 (min: 249.09, max: 325.06)
ID_77: 71.52 (min: 155.94, max: 227.46)
ID_1088: 56.65 (min: 72.76, max: 129.41)
MEDIA PELIGROSIDAD:
Number of barrios: 323
Range statistics:
Mean range: 10.95
Median range: 0.00
Min range: 0.00
Max range: 204.39
Std range: 23.48
Top 10 barrios with highest exposure ranges:
ID_788: 204.39 (min: 306.42, max: 510.81)
ID_559: 191.50 (min: 191.50, max: 383.00)
ID_1492: 101.61 (min: 159.37, max: 260.98)
ID_1548: 82.58 (min: 463.73, max: 546.30)
ID_801: 82.27 (min: 147.00, max: 229.27)
ID_77: 82.09 (min: 163.75, max: 245.84)
ID_322: 78.84 (min: 92.68, max: 171.52)
ID_5681: 76.28 (min: 0.00, max: 76.28)
ID_812: 73.33 (min: 197.73, max: 271.06)
ID_377: 72.14 (min: 78.90, max: 151.04)
============================================================
OVERALL SUMMARY
============================================================
Total barrios analyzed: 323
Total observations: 969
Overall exposure range statistics by risk level:
count mean median std min max
peligrosidad
alta 323 4.26 0.00 10.41 0.0 82.10
baja 323 11.57 0.24 22.08 0.0 164.44
media 323 10.95 0.00 23.48 0.0 204.39
Top 15 barrios with highest exposure ranges across all risk levels:
ID_788 (media): 204.39
ID_559 (media): 191.50
ID_74 (baja): 164.44
ID_812 (baja): 146.54
ID_59 (baja): 118.40
ID_1065 (baja): 115.90
ID_494 (baja): 110.65
ID_807 (baja): 110.06
ID_1492 (media): 101.61
ID_975 (baja): 95.64
ID_1548 (media): 82.58
ID_801 (media): 82.27
ID_1548 (alta): 82.10
ID_77 (media): 82.09
ID_322 (media): 78.84
============================================================
RESULTS STORED IN:
exposure_ranges_df: DataFrame with exposure ranges for each barrio and risk level
============================================================